04. Jester Dataset

To train their gesture recognition system, TwentyBN created the Jester dataset. It consists of 148,092 labeled videos depicting 25 classes of human hand gestures. To teach the system to distinguish deliberate gestures from other hand movements, the dataset also includes two extra classes: "No gesture" and "Doing other things".
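
The annotations ship as simple CSV files mapping each video ID to its label, so inspecting the class distribution takes only a few lines. The sketch below assumes the dataset's usual distribution format of semicolon-separated CSVs; the file name jester-v1-train.csv is an assumption, so check it against your own download:

```python
# Minimal sketch for inspecting the Jester annotations. Assumes a
# semicolon-separated CSV mapping each video ID to its label; the
# file name "jester-v1-train.csv" is an assumption.
import csv
from collections import Counter

def load_annotations(csv_path):
    """Return a list of (video_id, label) pairs from a Jester CSV."""
    with open(csv_path, newline="") as f:
        return [(row[0], row[1]) for row in csv.reader(f, delimiter=";")]

annotations = load_annotations("jester-v1-train.csv")
class_counts = Counter(label for _, label in annotations)

# 27 classes in total: 25 gestures plus "No gesture" and "Doing other things"
print(f"{len(annotations)} training videos across {len(class_counts)} classes")
for label, count in class_counts.most_common(5):
    print(f"{count:6d}  {label}")
```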

The videos in the Jester dataset are short clips, each about three seconds long, of people performing hand gestures in real-world settings. They were captured with a variety of webcams and cover different lighting conditions, varying zoom factors, motion blur, partial occlusions, and background noise. The video below, taken from the Jester dataset, shows a person performing a "Swiping Left" gesture:

[Video: a person from the Jester dataset performing a "Swiping Left" gesture]

As the video above shows, the clips are meant to capture the messy nuances of real-world conditions, such as a cat walking through the scene and sub-optimal lighting. This is what makes the Jester dataset so valuable: training a 3D CNN on these real-world scenarios forces it to learn the difference between a hand gesture and background motion, such as the cat crossing the frame.
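
Before a 3D CNN can learn from these clips, each video has to be loaded as a temporal stack of frames. The hedged sketch below assumes the dataset's usual distribution format of one directory per video containing sequentially numbered JPEG frames recorded at roughly 12 fps; the path "20bn-jester-v1/1" is a hypothetical example:

```python
# Hedged sketch: loading one Jester clip as a frame stack for a 3D CNN.
# Assumes one directory per video holding sequentially numbered JPEG
# frames (~12 fps); the path "20bn-jester-v1/1" is hypothetical.
from pathlib import Path

import numpy as np
from PIL import Image

def load_clip(video_dir, size=(100, 176)):
    """Stack a clip's frames into a (T, H, W, C) uint8 array.

    size is (height, width); PIL's resize expects (width, height),
    hence the reversal below.
    """
    frame_paths = sorted(Path(video_dir).glob("*.jpg"))
    frames = [np.asarray(Image.open(p).convert("RGB").resize(size[::-1]))
              for p in frame_paths]
    return np.stack(frames)

clip = load_clip("20bn-jester-v1/1")
print(clip.shape)  # roughly (36, 100, 176, 3) for a ~3 s clip at 12 fps
```

For a PyTorch-style 3D CNN, the stack would then typically be normalized and transposed to (C, T, H, W) before batching.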